Effect of the Environment on Speech Statistics
APPROACHES TO ENVIRONMENT COMPENSATION IN AUTOMATIC SPEECH RECOGNITION
Authors
Pedro J. Moreno, Bhiksha Raj, Richard M. Stern
Department of Electrical and Computer Engineering and School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, U.S.A.
Abstract
This paper describes a series of cepstral-based compensation procedures that render the SPHINX-II continuous speech recognition system more robust with respect to acoustical changes in the environment. The first two algorithms, SNR-based MultivaRiate gAussian-based cepsTral normaliZation (SNR-based RATZ) and STAtistical Reestimation of HMMs (STAR), compensate for environmental degradation based on comparisons of simultaneously recorded data in the training and testing environments (“stereo data”). They differ in that RATZ modifies the incoming feature vectors to a recognition system while STAR modifies the internal representation of speech by the system. We also describe N-CDCN, an improved version of codeword-dependent cepstral normalization (CDCN) which does not require stereo training data but nevertheless achieves performance levels comparable to RATZ and other algorithms that require stereo training. Use of these compensation algorithms significantly reduces the error rates for SPHINX-II. The algorithms are tested on a variety of databases and environmental conditions.

INTRODUCTION
Robustness with respect to environmental variability remains a continuing problem for speech recognition technology (e.g. [3]). For example, the use of microphones other than the ARPA standard Sennheiser HM-414 headset (CLSTLK) severely degrades the performance of speech recognition systems like SPHINX-II, even in relatively quiet environments [1, 4].

Traditional algorithms to compensate for environmental variation have either relied on the availability of simultaneously recorded data in the training and testing environments (“stereo data”), or have used structural models to define the degradation. In the first group, multiple fixed codeword-dependent cepstral normalization (MFCDCN) [4] uses stereo data to compute correction vectors that compensate for the effects of the environment. Dual-channel codebook adaptation (DCCA) [4], on the other hand, modifies the statistical representation used in the HMMs for speech on the basis of comparisons obtained from stereo data.

The other approach to environmental compensation is through the use of structural models of degradation. For example, codeword-dependent cepstral normalization (CDCN) [1] assumes that speech is degraded by unknown additive noise and unknown linear filtering. It makes use of expectation-maximization (EM) techniques to determine the parameters characterizing these distortions.

In this paper we describe three new cepstral-domain compensation strategies: SNR-based MultivaRiate gAussian-based cepsTral normaliZation (SNR-based RATZ), STAtistical Reestimation of HMMs (STAR), and new CDCN (N-CDCN). SNR-based RATZ and STAR both make use of stereo data. They differ in that RATZ modifies the incoming feature vectors to a recognition system while STAR modifies the internal representation of speech by the system. RATZ and STAR are similar in philosophy to MFCDCN and DCCA, respectively [4], but they achieve improved performance through the use of better mathematical models which introduce strong structural constraints into the assumed distribution for speech. N-CDCN is a modification and improvement of the original CDCN algorithm, which is based on a structural model of degradation.

EFFECT OF THE ENVIRONMENT ON SPEECH STATISTICS
In this section we describe how even well-behaved environments, such as those that can be modeled by unknown linear filtering and additive stationary noise, modify the statistics of “clean” speech in ways that are hard to predict. Even though we can formulate equations that analytically describe how the pdfs of clean speech change, the solutions of these equations are mathematically intractable.

For analytical purposes, we adopt the simple model of degradation proposed by Acero [1]. In this model, degraded speech is characterized by passing high-quality clean speech through a linear filter and contaminating the filtered output with additive stationary noise. For simplicity, we will also assume that the feature vector is unidimensional, although all conclusions developed can be easily extended to an arbitrary N-dimensional space such as the log-spectral domain. The degraded speech can be characterized as:

$$P_Y(\omega) = |H(\omega)|^2 \, P_X(\omega) + P_N(\omega) \qquad (1)$$

where $P_Y(\omega)$ represents the power spectrum of the degraded speech, $P_X(\omega)$ is the power spectrum of the clean speech, $H(\omega)$ is the transfer function of the linear filter, and $P_N(\omega)$ is the power spectrum of the additive noise. In the log-spectral domain this relation can be expressed as:
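One way to write the log-domain relation, as a sketch under assumed notation rather than the authors' own equation: let $x$, $y$ and $n$ be the natural-log power spectra of the clean speech, the degraded speech and the noise at a single frequency, and let $q$ be the log of the squared magnitude of the filter. Taking the logarithm of "degraded power = filtered clean power + noise power" then gives

$$y = x + q + \log\left(1 + e^{\,n - x - q}\right).$$

The Python sketch below uses this form to illustrate numerically the point made in this section. It draws a unidimensional Gaussian clean feature, applies the degradation with a fixed filter gain and a fixed noise level, and compares the sample mean and standard deviation before and after. The function names (`degrade_log_spectrum`, `simulate`) and all parameter values are hypothetical choices for illustration, and treating the noise as a constant level is a simplifying assumption.

```python
import math
import random
import statistics


def degrade_log_spectrum(x, q, n):
    """Return the degraded log power y for clean log power x.

    Computes y = log(exp(x + q) + exp(n)) in the numerically stable form
    y = x + q + softplus(n - x - q). The symbols x, q, n are assumed notation.
    """
    d = n - x - q
    # Stable softplus: log(1 + exp(d)) without overflow for large |d|.
    softplus = max(d, 0.0) + math.log1p(math.exp(-abs(d)))
    return x + q + softplus


def simulate(mean_x, std_x, q, n, num_samples=20000, seed=0):
    """Monte Carlo estimate of clean and degraded mean / standard deviation."""
    rng = random.Random(seed)
    clean = [rng.gauss(mean_x, std_x) for _ in range(num_samples)]
    degraded = [degrade_log_spectrum(x, q, n) for x in clean]
    return (
        statistics.fmean(clean),
        statistics.pstdev(clean),
        statistics.fmean(degraded),
        statistics.pstdev(degraded),
    )


if __name__ == "__main__":
    # Sweep the (assumed constant) log noise power from negligible to dominant.
    for noise_level in (-30.0, 0.0, 5.0, 10.0):
        mc, sc, md, sd = simulate(mean_x=5.0, std_x=3.0, q=1.0, n=noise_level)
        print(
            f"n={noise_level:6.1f}  clean mean/std = {mc:5.2f}/{sc:4.2f}  "
            f"degraded mean/std = {md:5.2f}/{sd:4.2f}"
        )
```

When the noise is negligible the degraded feature is simply the clean feature shifted by $q$. As the noise level rises, the degraded mean moves upward and the spread shrinks, because values of $x + q$ below the noise level are pushed up toward $n$. Because the mapping is nonlinear, a Gaussian clean feature generally does not remain Gaussian after degradation, which is one reason the analytical treatment is hard.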
Publication date: 1995